The tremendous pandemic potential of coronaviruses was
demonstrated twice in the past few decades by two global outbreaks
of deadly pneumonia. Entry of coronaviruses into cells is mediated
by the transmembrane spike glycoprotein S, which forms a trimer
carrying receptor-binding and membrane fusion functions 1.
S also contains the principal antigenic determinants and is the
target of neutralizing antibodies. Here we present the structure
of a mouse coronavirus S trimer ectodomain determined at 4.0 Å
resolution by single particle cryo-electron microscopy. It reveals
the metastable pre-fusion architecture of S and highlights key
interactions stabilizing it. The structure shares a common core with
paramyxovirus F proteins 2,3, implicating mechanistic similarities
and an evolutionary connection between these viral fusion proteins.
The accessibility of the highly conserved fusion peptide at the
periphery of the trimer indicates potential vaccinology strategies
to elicit broadly neutralizing antibodies against coronaviruses.
Finally, comparison with crystal structures of human coronavirus
S domains allows rationalization of the molecular basis for species
specificity based on the use of spatially contiguous but distinct
domains.

Coronaviruses are enveloped viruses responsible for 30% of mild 
respiratory infections and atypical pneumonia in humans worldwide 4 . 
The emergence of the severe acute respiratory syndrome coronavirus 
(SARS-CoV) in 2002 and of the Middle East respiratory syndrome 
coronavirus (MERS-CoV) in 2012 demonstrated that these zoonotic 
viruses can transmit to humans from various animal species, and
suggested that additional emergence events are likely to occur. The
fatality rates of SARS-CoV and MERS-CoV infections are about 10-37% 1,4
and there are no approved antiviral treatments or vaccines.

Coronaviruses use S homotrimers to promote cell attachment and 
fusion of the viral and host membranes. S determines host range and cell
tropism and is the main target of neutralizing antibodies during
infection 1. S is a class I viral fusion protein synthesized as a single
chain precursor of about 1,300 amino acids that trimerizes upon folding.
It is composed of an amino-terminal S1 subunit, containing the
receptor-binding domain, and a carboxy-terminal S2 subunit, driving
membrane fusion. Cleavage by furin-like host proteases at the
junction between S1 and S2 (S2 cleavage site) occurs during biogenesis
for some coronaviruses such as mouse hepatitis virus (MHV, the
prototypical and best-studied coronavirus) 1,5. The S1 and S2 subunits
remain non-covalently associated in the metastable pre-fusion
S trimer. After virion uptake by target cells, a second cleavage is
mediated by endo-lysosomal proteases (S2′ cleavage site), allowing
fusion activation of coronavirus S proteins 6.

Crystal structures of coronavirus S post-fusion cores demonstrated 
that the fusogenic conformational changes lead to the formation of 
a so-called trimer of hairpins that is the hallmark of class I fusion 
proteins 7-10. These structures contain two heptad-repeat (HR) regions
present in S2 assembled as an extended triple helical coiled-coil motif
(HR1) surrounded by three shorter helices (HR2). Crystal structures of
several coronavirus S receptor-binding domains in complex with their
cognate receptors have also been reported 11-14. Finally, cryo-electron
microscopy (cryoEM) of SARS-CoV virions provided a snapshot of
the S glycoprotein at 16 Å resolution 15. The lack of high-resolution data
for any coronavirus S trimer has prevented a detailed analysis of the 
infection mechanisms. 

We produced an MHV S ectodomain trimer with enhanced stability 
by mutating the S 2 cleavage site and fusing a GCN4 trimerization motif 
at the C-terminal end of the construct. The resulting MHV S
ectodomain forms a trimer binding with high affinity to the soluble mouse
CEACAM1a receptor (Extended Data Fig. 1a, b). We used state-of-the-art
cryoEM 16 to determine the structure of the MHV S ectodomain
trimer at 4.0 Å resolution (Fig. 1a-c and Extended Data Figs 2 and 3).
We fitted the crystal structures of two S1 domains 11,13,17 and built
de novo the rest of the polypeptide chain using Coot 18 and Rosetta 19,20
(Fig. 1d-f, Extended Data Figs 2-4 and Supplementary Tables 1 and 2).
The final model includes residues 15 to 1118, with an internal break
corresponding to a loop immediately upstream from the S2′ cleavage
site (residues 827-863). The region connecting the S1 and S2 subunits
(residues 718-754) features weak density that correlates with its
accessibility for proteolytic cleavage in vivo. Residues 453-535 were
modelled by density-guided homology modelling using Rosetta owing to the
poor quality of the density in this region (Extended Data Fig. 3k).

The MHV S ectodomain is a 140-Å-long trimer with a triangular
cross-section varying in diameter from 70 Å, at the membrane proximal
base, to 140 Å at the membrane distal end (Fig. 1d, e). The structure
comprises two functional subunits (Fig. 2a-d): a distal moiety
constituted by the S1 subunits; and a central stem connecting to the viral
membrane formed by the S2 subunits.

The S1 subunit has a ‘V’ shape contributing to the overall triangular
appearance of the S trimer (Extended Data Fig. 5a). The S1 N-terminal
moiety comprises domain A, which is folded as a galectin-like β-sandwich
decorated with extended loops on the viral membrane distal side, and
a three-stranded antiparallel β-sheet plus an α-helix on the viral
membrane proximal side. The S1 C-terminal half folds as three spatially
distinct β-rich domains, termed B, C and D (Fig. 2a-d).

Figure 1 | 3D reconstruction of the MHV S trimer determined
by single-particle cryoEM. a-c, 3D map filtered at 4.0 Å
resolution coloured by protomer. Two different views of the
S trimer (from the side (a) and from the top, looking towards the
viral membrane (b)), and a side view of one S protomer (c) are
shown. d-f, Ribbon diagrams showing the MHV S atomic model
oriented as in a-c.

The S2 subunit connects to the viral membrane and is characterized
by the presence of long α-helices (Figs 2b-d and 3a). A central helix
(α30) stretches 75 Å along the three-fold molecular axis towards the
viral membrane (Fig. 3a). It is located immediately downstream of the
HR1 motif, which folds as four consecutive α-helices (α26-α29; Fig. 3a
and Extended Data Fig. 6a, b), in sharp contrast to the 120-Å-long
HR1 helix observed in the post-fusion S structures 7-9 (Extended Data
Fig. 6c-e). The 55-Å-long upstream helix (α20), so named because it
is located immediately upstream of the S2′ cleavage site, runs parallel
to and is zipped against the central helix via hydrophobic contacts
largely following a heptad-repeat pattern. A core antiparallel β-sheet
(β46-β49-β50) is present at the viral membrane proximal end and is
assembled from an N-terminal β-strand (β46), preceding the upstream
helix, and a C-terminal β-hairpin (β49-β50), located downstream of
the central helix.
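
As a rough sanity check on the helix lengths quoted above, the sketch below
converts each length into an approximate residue count. It assumes an ideal
α-helix with a rise of 1.5 Å per residue; real helices deviate from this, and
the function name and dictionary labels are illustrative only.

```python
# Rise per residue along the axis of an ideal alpha-helix, in angstroms.
# This is a textbook value, used here as an assumption.
ALPHA_HELIX_RISE_A = 1.5


def residues_for_length(length_angstrom: float) -> int:
    """Approximate number of residues in an ideal alpha-helix of this length."""
    return round(length_angstrom / ALPHA_HELIX_RISE_A)


# Lengths quoted in the text for the S2 helices.
helix_lengths_a = {
    "central helix (pre-fusion)": 75.0,
    "upstream helix (pre-fusion)": 55.0,
    "HR1 helix (post-fusion)": 120.0,
}

for name, length in helix_lengths_a.items():
    print(f"{name}: {length:.0f} A, about {residues_for_length(length)} residues")
```

Under this assumption, the 120-Å post-fusion HR1 helix would span roughly 80
residues, whereas no single pre-fusion helix listed here exceeds about 50.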

MHV S2 features a topology similar to the paramyxovirus F proteins
(such as respiratory syncytial virus (RSV) F: root mean squared
deviation (r.m.s.d.) 4 Å over 125 residues), with a comparable 3D
organization of the core β-sheet, the upstream helix and the central helix
(Fig. 3a, b). Importantly, these motifs were shown to remain invariant
in the pre- and post-fusion F structures 2,3. The conservation of these
motifs among coronavirus S and paramyxovirus F proteins suggests that
these fusion proteins have evolved from a distant common ancestor.
Although the density is too weak to trace the polypeptide chain
downstream from β50, secondary structure predictions suggest that the
domain directly preceding HR2 could adopt a similar fold in coronavirus S
and paramyxovirus F proteins.
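
The r.m.s.d. quoted above summarizes how far apart equivalent atoms lie after
two structures have been superimposed. The sketch below shows only the final
arithmetic step, assuming the residue pairing and the superposition have
already been done (here that is the job of a structural alignment server such
as DALI). The coordinates are toy values, not taken from either structure.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two equally long lists of (x, y, z).

    The lists are assumed to be already paired residue by residue and
    already superimposed; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: every atom is displaced by 4 A along x, so the r.m.s.d. is 4 A.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 4.0, y, z) for x, y, z in reference]
print(f"r.m.s.d. = {rmsd(reference, shifted):.1f} A")
```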

In the S trimer, the three central helices are packed via their
central portions whereas the two ends splay away from the three-fold
axis (Extended Data Fig. 7a-c). Additional contacts between the
upstream and central helices participate in inter-protomer
interactions. Furthermore, the S1 subunits interlock to form a crown
around the S2 trimer, stabilizing it in the pre-fusion conformation
(Fig. 3c, d and Supplementary Table 3). This is illustrated by the large
surface area buried at the interface between each S1 subunit and the S2
subunits of the three protomers (1,970 Å²). Many of these contacts involve
the HR1 helices and the fusion peptide region. These polypeptide
segments undergo major refolding during the fusogenic conformational
changes (Extended Data Fig. 6a-e), which supports the notion that
the S1 subunits maintain the S2 fusion machinery in its metastable
state. Substitutions of the conserved alanine 994 by valine in helix
α28 or of the conserved leucine 1062 by phenylalanine in the
central helix were shown to attenuate fusogenicity 21,22. Our structure
suggests that the former substitution would strengthen hydrophobic
packing against the core β-sheet (Extended Data Fig. 7b), and that
the latter substitution could reinforce molecular stapling of the
central helices (Extended Data Fig. 7a, c). The expected modification
of the energy landscape between pre-fusion and post-fusion
conformations would explain the reduction in fusion activity of these
mutants 21,22.
Figure 2 | Architecture of the MHV S protomer.
a, Schematic diagram of the S glycoprotein
organization. Black and grey dashed lines denote
regions unresolved in the reconstruction and
regions that were not part of the construct,
respectively. BH, β-hairpin (β49-β50); CH, central
helix; CT, cytoplasmic tail; FP, fusion peptide;
HR1/HR2, heptad-repeats; TM, transmembrane
domain; UH, upstream helix. b-d, Ribbon
diagrams depicting three views of the S protomer
coloured as in a. Asterisk denotes the MHV S
receptor-binding region. Disulfide bonds are
shown as green sticks except for residues 453-535,
for which they are not shown.

Figure 3 | Pre-fusion structure of the coronavirus
fusion machinery. a, b, Topology and ribbon
diagrams showing the structural similarity
between coronavirus MHV S2 (starting at residue
755) (a) and paramyxovirus RSV F (PDB 5C6B)
(b). For clarity, only part of RSV F is shown, with
conserved secondary structural elements coloured
identically as for MHV S2. ‘#’ denotes motifs
participating in the post-fusion HR1 coiled-coil.
c, d, Two different views of the MHV S trimer
(from the side (c) and top, looking towards the
host cell membrane (d)) highlighting how S1
(ribbon diagram and semi-transparent surface)
wraps around the S2 fusion machinery (ribbon
diagram) to stabilize it.


The predicted fusion peptide includes the C-terminal half of helix
α21 and extends up to the N-terminal half of α22 (refs 6 and 23) (Fig. 2c).
α21 is an amphipathic helix located at the periphery of the S trimer,
burying hydrophobic side chains towards the S2 centre and exposing
charged residues to solvent (Fig. 2c and Extended Data Fig. 7b, c). In
the case of porcine epidemic diarrhoea coronavirus, trypsin processing
at the S2′ site can only occur after host cell attachment 24. This indicates
that receptor binding could allosterically increase the accessibility of
the S2′ site, which is located within helix α21. The acidic pH of the
endo-lysosomes could also contribute to exposing the S2′ cleavage site for
coronaviruses requiring cleavage in this compartment. The fact that
helix α21 appears dynamic and is found immediately downstream from
a disordered loop suggests that it could undergo considerable ‘breathing’
motions. Regardless of the mechanism promoting cleavage, the MHV S
structure reported here explains the requirement for processing at the
S2′ site, as it frees the fusion peptide from the S2 N-terminal region,
which is a prerequisite for its insertion ~200 Å away in the target
membrane. The peripheral position of the fusion peptide is similar to what
has been observed in the parainfluenza virus 5 F (ref. 3) and HIV gp41
(ref. 25) prefusion structures (Extended Data Fig. 8a-c). The notable
accessibility of the fusion peptide and its sequence conservation among
coronaviruses 6,23 suggest that it would be an ideal target for
epitope-focused vaccinology initiatives aimed at raising broadly
neutralizing antibodies against S glycoproteins (Fig. 4a-c and Extended
Data Fig. 9). Major antigenic determinants (inducing neutralizing
antibodies) of MHV and SARS-CoV S proteins overlap with the fusion peptide
region and support the suitability of this approach 26,27. Antibodies
binding to this site will not only hinder insertion of the fusion peptide
into the target membrane, but will also putatively prevent fusogenic
conformational changes. This epitope-focused strategy has proven
successful in obtaining neutralizing antibodies against RSV F (ref. 28).

Figure 4 | Potential strategy for neutralizing coronavirus infections.
a, Surface representation of the MHV S trimer coloured according to
sequence conservation using the alignment presented in Extended
Data Fig. 9. The fusion peptide sequence is highly conserved among
coronavirus S proteins. b, Surface representation of the MHV S trimer
highlighting the peripheral position of the fusion peptide (blue and
cyan). c, Ribbon diagrams of the MHV S trimer showing the overlapping
positions of the fusion peptide (residues 870-887, blue and cyan) and of a
major antigenic determinant identified for MHV and SARS-CoV (residues
875-905, magenta spheres).

The spatial proximity of domains A and B in the S trimer allows
rationalization of their alternative use among coronaviruses to interact
with host receptors. MHV uses the viral membrane distal loops decorating
domain A to interact with CEACAM1a (ref. 13), whereas MERS-CoV
and SARS-CoV rely on the β-motif protruding from domain B to bind
to DPP4 (ref. 11) or ACE2 (refs 12 and 14), respectively (Extended
Data Fig. 5a-d). The poor sequence conservation of the B domain
β-motif among coronavirus S proteins, its considerable length variation
among MHV strains (Extended Data Fig. 9) and our density-guided
homology model of this motif indicate structural and functional
differences. These structural variations constitute the molecular basis
underlying coronavirus species specificity and cell tropism using a single S
architectural scaffold.

Sequence comparisons indicate that the MHV spike S1 and S2
subunits respectively share ~25% and ~40% sequence similarity with
many other coronavirus S proteins (Extended Data Fig. 9). Therefore,
the structure reported here is representative of the architecture of other
coronavirus S proteins such as those of MERS-CoV and SARS-CoV. This
hypothesis is further supported by (1) the structural similarity of the
MHV (ref. 13) and bovine coronavirus (ref. 17) A domains; (2) the
structural similarity of the MHV, MERS-CoV (ref. 11), SARS-CoV (ref. 12)
and HKU4 (ref. 29) B domains (Extended Data Fig. 10); (3) the structural
similarity of the post-fusion cores of MHV (ref. 7), SARS-CoV (refs 8
and 10) and MERS-CoV (ref. 9); and (4) the isolation of infectious
coronaviruses featuring a deletion of the A domain and using domain B as
the receptor-binding domain 30. Our results now provide a framework to
understand coronavirus entry and suggest ways for preventing or treating
future coronavirus outbreaks.

Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS

No statistical methods were used to predetermine sample size. 

Plasmids. A human codon-optimized gene encoding the MHV spike
(UniProt: P11224) was synthesized with an Arg717Ser amino acid mutation
to abolish the furin cleavage site at the S1-S2 junction (S2 cleavage site). From
this gene, the fragment encoding the MHV ectodomain (residues 15-1231) was
PCR-amplified and ligated to a gene fragment encoding a GCN4 trimerization
motif (IKRMKQIEDKIEEIESKQKKIENEIARIKKIK) 3,31, a thrombin cleavage
site (LVPRGSLE), an 8-residue-long Strep-Tag (WSHPQFEK) and a stop codon.
This construct results in fusing the GCN4 trimerization motif in register with the
HR2 helix at the C-terminal end of the MHV S-encoding sequence. This gene was
cloned into the pMT/BiP/V5/His expression vector (Invitrogen) in frame with
the Drosophila BiP secretion signal downstream of the metallothionein promoter.
The D1 domain of mouse CEACAM1a (residues 35-142; gb NP_001034274.1)
was amplified by PCR and cloned into a mammalian expression plasmid 32, in
frame with a CD5 signal sequence at the 5′ end, and with a sequence encoding a
thrombin cleavage site, a glycine linker and the Fc domain of human IgG1 at the
3′ end, creating the pCD5-MHVR-T-Fc vector.

Production of recombinant CEACAM1a ectodomain by transient transfection.
293-F cells were grown in suspension using FreeStyle 293 Expression Medium
(Life Technologies) at 37 °C in a humidified 5% CO₂ incubator on a Celltron shaker
platform (Infors HT) rotating at 130 r.p.m. (for 1 l culture flasks). Twenty-four
hours before transfection, cell density was adjusted to 1.5 × 10⁶ cells ml⁻¹, and
the culture was grown overnight under the same conditions to reach
~2.5 × 10⁶ cells ml⁻¹ on the day of transfection. Cells were collected by
centrifugation at 1,250 r.p.m. for 5 min, and resuspended in fresh FreeStyle 293
Expression Medium (Life Technologies) without antibiotics at a density of
2.5 × 10⁶ cells ml⁻¹.

To produce recombinant CEACAM1a ectodomain, 400 µg of pCD5-MHVR-T-Fc
vector (purified using EndoFree plasmid kit from Qiagen) were added to
200 ml of suspension cells. The cultures were swirled for 5 min on a shaker in the
culture incubator before adding 9 µg ml⁻¹ of linear polyethylenimine (PEI)
solution (25 kDa, Polysciences). Twenty-four hours after transfection, cells were
diluted 1:1 with FreeStyle 293 Expression Medium and the transfected cells were
cultivated for 6 days. Clarified cell supernatants were concentrated tenfold using
Vivaflow tangential filtration cassettes (Sartorius, 10-kDa cut-off) before affinity
purification using a Protein A column (GE Life Sciences) followed by gel filtration
chromatography using a Superdex 200 10/300 GL column (GE Life Sciences)
equilibrated in 20 mM Tris-HCl, pH 7.5, 100 mM NaCl. The Fc tag was removed by
trypsin cleavage in a reaction mixture containing 7 mg of recombinant CEACAM1a
ectodomain and 5 µg of trypsin in 100 mM Tris-HCl, pH 8.0 and 20 mM CaCl₂. The
reaction mixture was incubated at 25 °C overnight and re-loaded onto a Protein A
column to remove uncleaved protein and the Fc tag. The cleaved protein was further
purified by gel filtration using a Superdex 75 10/300 GL column (GE Life Sciences)
equilibrated in 20 mM Tris-HCl, pH 7.5, 100 mM NaCl. The purified protein was
quantified using absorption at 280 nm and concentrated to approximately 10 mg ml⁻¹.

Production of recombinant MHV S ectodomain in Drosophila S2 cells. To
generate a stable Drosophila S2 cell line expressing recombinant MHV spike
ectodomain, we used Effectene (Qiagen) and 2 µg of the plasmid encoding the MHV
S protein ectodomain. A second plasmid, encoding blasticidin S deaminase, was
cotransfected as a dominant selectable marker. Stable MHV S ectodomain expressing
cell lines were selected by addition of 10 µg ml⁻¹ blasticidin S (Invivogen) to the
culture medium 48 h after transfection.

For large-scale production of MHV S ectodomain the cells were cultured
in spinner flasks and induced by 5 µM CdCl₂ at a density of approximately 10⁷
cells per ml. After a week at 28 °C, clarified cell supernatants were concentrated
40-fold using Vivaflow tangential filtration cassettes (Sartorius, 10-kDa cut-off)
and adjusted to pH 8.0, before affinity purification using a StrepTactin Superflow
column (IBA) followed by gel filtration chromatography using a Superose 6 10/300
GL column (GE Life Sciences) equilibrated in 20 mM Tris-HCl, pH 7.5, 100 mM
NaCl. The purified protein was quantified using absorption at 280 nm and
concentrated to approximately 4 mg ml⁻¹.

SEC-MALS. For size exclusion chromatography coupled with multi-angle light
scattering (SEC-MALS) analysis, samples (0.2 ml at 1 mg ml⁻¹) were loaded onto
a Superdex 200 10/300 GL column (GE Life Sciences, 0.4 ml min⁻¹ in gel filtration
buffer) and passed through a Wyatt DAWN Heleos II EOS 18-angle laser
photometer coupled to a Wyatt Optilab TrEX differential refractive index detector.
Data were analysed using Astra 6 software (Wyatt Technology Corp).

MicroScale Thermophoresis. Solution MicroScale Thermophoresis (MST)
binding studies were performed using standard protocols on a Monolith NT.115
(NanoTemper Technologies). In brief, recombinant CEACAM1a ectodomain
protein was labelled using the RED-NHS (Amine Reactive) Protein Labelling
Kit (NanoTemper Technologies). The MHV S ectodomain protein was serially
diluted in 20 mM Tris-HCl, pH 7.5, 100 mM NaCl and the labelled recombinant
CEACAM1a was added to a final concentration of 500 nM before overnight
incubation at 4 °C. The CEACAM1a concentration was chosen such that the
observed fluorescence was approximately 1,000 U at 40% LED power. The samples
were loaded into standard-treated Monolith capillaries and were measured by
standard protocols using the Monolith NT.115. The changes in the
fluorescent thermophoresis signal were plotted against the concentration of
the serially diluted MHV spike protein, and Kd values were determined using the
NanoTemper analysis software.
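
To illustrate how a titration of this kind relates to a dissociation constant,
the sketch below evaluates the standard 1:1 binding model in which the
labelled partner is held at a fixed concentration and depletion of the titrant
is taken into account. This is an assumed model written for illustration; the
fitting routine of the analysis software is not described in the text. The
500 nM labelled concentration and the 48.5 nM Kd are taken from the text, and
the twofold dilution series is hypothetical.

```python
import math


def fraction_bound(titrant_nm: float, labelled_nm: float, kd_nm: float) -> float:
    """Fraction of the labelled partner in complex for a 1:1 binding model.

    Solves Kd = [free labelled][free titrant] / [complex] for [complex],
    using total concentrations (quadratic form, no excess-ligand shortcut).
    """
    total = titrant_nm + labelled_nm + kd_nm
    complex_nm = (total - math.sqrt(total ** 2 - 4.0 * titrant_nm * labelled_nm)) / 2.0
    return complex_nm / labelled_nm


LABELLED_NM = 500.0   # labelled CEACAM1a concentration quoted in the text
KD_NM = 48.5          # dissociation constant reported in Extended Data Fig. 1

# Hypothetical twofold serial dilution of the unlabelled titrant (nM).
dilution_series_nm = [4000.0 / 2 ** i for i in range(12)]

for conc in dilution_series_nm:
    print(f"{conc:10.2f} nM titrant -> {fraction_bound(conc, LABELLED_NM, KD_NM):.3f} bound")
```

Because the labelled concentration is roughly tenfold higher than the Kd, the
simple hyperbolic approximation would be inaccurate here, which is why the
quadratic form is used in this sketch.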

CryoEM sample preparation and data collection. Three microlitres of MHV
spike at 1.85 mg ml⁻¹ was applied to a 1.2/1.3 C-flat grid (Protochips), which had
been glow-discharged for 30 s at 20 mA. Thereafter, grids were plunge-frozen in
liquid ethane using a Gatan CP3 and a blotting time of 3.5 s. Data were acquired
using an FEI Titan Krios transmission electron microscope operated at 300 kV
and equipped with a Gatan K2 Summit direct detector. Coma-free alignment was
performed using the Leginon software 33. Automated data collection was carried out
using Leginon 34 to control both the FEI Titan Krios (used in microprobe mode at a
nominal magnification of 22,500×) and the Gatan K2 Summit operated in counted
mode (pixel size: 1.315 Å) at a dose rate of ~9 counts per physical pixel per s, which
corresponds to ~12 electrons per physical pixel per s (when accounting for
coincidence loss 35). Each movie had a total accumulated exposure of 53 e⁻ Å⁻²
fractionated in 38 frames of 200 ms (yielding movies of 7.6 s). A data set of ~1,600
micrographs was acquired in a single session using a defocus range of between 2.0
and 5.0 µm.
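
The exposure figures quoted above can be cross-checked against one another
with a few lines of arithmetic. The sketch below uses only numbers stated in
the paragraph (pixel size, approximate dose rate, frame count and frame
duration); the variable names are illustrative.

```python
# Values quoted in the data-collection paragraph.
PIXEL_SIZE_A = 1.315             # angstroms per physical pixel
DOSE_RATE_E_PER_PIXEL_S = 12.0   # electrons per physical pixel per second (approximate)
N_FRAMES = 38
FRAME_TIME_S = 0.2               # 200 ms per frame

# Total exposure time of one movie.
exposure_time_s = N_FRAMES * FRAME_TIME_S

# Convert the per-pixel dose rate into a dose rate per square angstrom.
dose_rate_e_per_a2_s = DOSE_RATE_E_PER_PIXEL_S / PIXEL_SIZE_A ** 2

# Accumulated dose over the movie, and the share carried by each frame.
total_dose_e_per_a2 = dose_rate_e_per_a2_s * exposure_time_s
dose_per_frame_e_per_a2 = total_dose_e_per_a2 / N_FRAMES

print(f"exposure time: {exposure_time_s:.1f} s")
print(f"total dose:    {total_dose_e_per_a2:.1f} e/A^2")
print(f"per frame:     {dose_per_frame_e_per_a2:.2f} e/A^2")
```

With these inputs the total comes out close to the 53 e⁻ Å⁻² stated in the
text, so the quoted dose rate, pixel size and exposure time are mutually
consistent.
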

CryoEM data processing. Whole-frame alignment was carried out using the
software developed previously 35, which is integrated into the Appion pipeline 36, to
account for stage drift and beam-induced motion. The parameters of the
microscope contrast transfer function were estimated for each micrograph using
ctffind3 (ref. 37). Micrographs were manually masked using Appion to exclude the
visible carbon supporting film from further processing. Particles were automatically
picked in a reference-free manner using DogPicker 38. Extraction of particle images
was performed using RELION 1.4 with a box size of 320 × 320 pixels and applying a
windowing operation in Fourier space to yield a final box size of 288 × 288 pixels
(corresponding to a pixel size of 1.46 Å). From the 1.2 million particles initially
picked, a subset of 50,000 particles was randomly selected to generate class averages
using RELION 39. An initial 3D model was generated using OPTIMOD 40 within the
Appion pipeline. The entire data set was subjected to 2D alignment and clustering
using RELION and particles belonging to the best-defined class averages were
retained (~500,000 particles). These ~500,000 particles were then subjected to
RELION 3D classification with four classes (using C1 symmetry) starting with our
initial model low-pass filtered to 40 Å resolution. We subsequently used the ~230,000
best particles (selected from the 3D classification) and the map corresponding to the
best 3D class (low-pass filtered at 40 Å resolution) to run RELION 3D auto-refine (C3
symmetry), which led to a reconstruction at 4.4 Å resolution. We used the particle
polishing procedure in RELION 1.4 to correct for individual particle movement
and radiation damage 41,42. A second round of 3D classification with 6 classes
(C3 symmetry) was performed using the polished particles, resulting in the selection
of 82,000 particles. A new 3D auto-refine run (C3 symmetry) using the selected
82,000 particles and the map corresponding to the best 3D class (low-pass filtered
at 40 Å resolution) yielded a map at 4.0 Å resolution following post-processing in
RELION. The final map was sharpened with an empirically determined B factor
of −220 Å² using RELION post-processing. Reported resolutions are based on the
gold-standard Fourier shell correlation (FSC) = 0.143 criterion 43, and Fourier shell
correlation curves were corrected for the effects of soft masking by high-resolution
noise substitution 44. The soft mask used for FSC calculation had a 10 pixel cosine
edge fall-off. The overall shape and dimensions of our reconstruction agree
with previous data although the HR2 stem connecting to the membrane is not
resolved 15.
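
Two of the numerical steps above can be sketched compactly: the pixel size
that results from cropping a 320-pixel box to 288 pixels in Fourier space, and
the way a resolution is read off an FSC curve at the 0.143 threshold. The FSC
values below are toy numbers chosen for illustration, not the curve from this
study, and the linear interpolation is an assumption of the sketch rather
than a description of what RELION does internally.

```python
PHYSICAL_PIXEL_A = 1.315   # pixel size quoted for the detector, in angstroms
EXTRACTED_BOX = 320        # box size at extraction, in pixels
CROPPED_BOX = 288          # box size after Fourier-space windowing, in pixels

# Cropping in Fourier space keeps the field of view and coarsens the sampling.
rescaled_pixel_a = PHYSICAL_PIXEL_A * EXTRACTED_BOX / CROPPED_BOX
nyquist_a = 2.0 * rescaled_pixel_a


def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Resolution (A) at which the FSC first drops below the threshold.

    freqs are spatial frequencies in 1/A, in increasing order; fsc holds the
    matching correlation values. Returns None if the curve never crosses.
    """
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            # Linear interpolation between the two shells around the crossing.
            t = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            crossing = freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
            return 1.0 / crossing
    return None


# Toy FSC curve (hypothetical values).
toy_freqs = [0.05, 0.10, 0.15, 0.20, 0.30]
toy_fsc = [0.99, 0.95, 0.70, 0.343, 0.043]

print(f"rescaled pixel size: {rescaled_pixel_a:.2f} A (Nyquist {nyquist_a:.2f} A)")
print(f"toy curve crosses 0.143 at {resolution_at_threshold(toy_freqs, toy_fsc):.2f} A")
```

The rescaled pixel size reproduces the 1.46 Å quoted in the text, which places
the Nyquist limit near 2.9 Å, comfortably beyond the reported 4.0 Å.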

Model building and analysis. Fitting of atomic models into cryoEM maps was
performed using UCSF Chimera 45 and Coot 18,46. We initially docked the MHV
domain A structure (PDB 3R4D) and used a crystal structure of a bovine
coronavirus domain A (PDB 4H14) to model the three-stranded β-sheet and the
α-helix present on the viral membrane proximal side of the galectin-like domain.
Next, the MERS-CoV domain B crystal structure (PDB 4KQZ) was also fit into the
density, and rebuilt and refined using RosettaCM 47. Although we could accurately
align the sequences corresponding to the core β-sheet of the MHV and MERS-CoV B
domains, the ~100 residues forming the β-motif extension (residues 453-535,
MERS-CoV/SARS-CoV receptor-binding moiety) could not be aligned with
confidence. We used RosettaCM to build models of each of the 945 possible disulfide
patterns into the density for domain B. For each disulfide arrangement, 50 models
were generated, and there was a very clear energy signal for a single such
arrangement (Extended Data Fig. 3k). Then, 1,000 models with this disulfide
arrangement were sampled, and the lowest energy model (using the Rosetta force
field augmented with a fit-to-density score term) was selected. Owing to the poor
quality of the reconstruction at the apex of the S trimer, the confidence of the model
is lowest for the segment corresponding to residues 453-535, as homology modelling
was used to fill in details missing in the map.
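
The figure of 945 disulfide patterns matches the number of ways of pairing ten
cysteines into five disulfide bonds (9 × 7 × 5 × 3 × 1). The number of
cysteines is not stated in the text, so ten is an inference made for this
sketch, which counts and enumerates the pairings; the cysteine labels are
placeholders, not residue numbers.

```python
def count_pairings(n_cys: int) -> int:
    """Number of ways to pair n_cys cysteines into n_cys / 2 disulfides."""
    if n_cys % 2:
        raise ValueError("an even number of cysteines is required")
    count = 1
    for k in range(n_cys - 1, 0, -2):   # (n-1) * (n-3) * ... * 1
        count *= k
    return count


def enumerate_pairings(cysteines):
    """Yield every complete pairing as a list of (cys_a, cys_b) tuples."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_pairings(remaining):
            yield [(first, partner)] + tail


# Placeholder labels C1..C10 (assumed count, see lead-in).
cysteine_labels = [f"C{i}" for i in range(1, 11)]
all_patterns = list(enumerate_pairings(cysteine_labels))

MODELS_PER_PATTERN = 50   # quoted in the text for the first sampling round
print(f"{count_pairings(10)} patterns, {len(all_patterns)} enumerated")
print(f"{len(all_patterns) * MODELS_PER_PATTERN} models in the first round")
```

Under this reading, the first sampling round amounts to 47,250 models before
the 1,000 additional models are sampled for the selected arrangement.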

A backbone model was then manually built for the rest of the S polypeptide
using Coot. Sequence register was assigned by visual inspection where side chain
density was clearly visible. This hand-built model was used as the starting
model for Rosetta de novo 20. The Rosetta-derived model largely agreed with the
hand-built model. Rosetta de novo successfully identified fragments that allowed
the sequence register to be anchored for domains C and D as well as for helices
α21-α25. Given these anchoring positions, RosettaCM 47 augmented with a novel
density-guided model-growing protocol was able to rebuild domains C and D in full.
The final model was refined by applying strict non-crystallographic symmetry
constraints using Rosetta 19. Model refinement was performed using a training map
corresponding to one of the two maps generated by the gold-standard refinement
procedure in RELION. The second map (testing map) was used only for calculation
of the FSC compared to the atomic model and for preventing overfitting 48. The
quality of the final model was analysed with MolProbity 49. Structure analysis was
assisted by the PISA 50 and DALI 51 servers. The sequence alignment was generated
using MultAlin 52 and coloured with ESPript 53. All figures were generated with UCSF
Chimera 45.

Extended Data Figure 1 | Biophysical characterization of the MHV
S ectodomain. a, The MHV S molecular mass was determined to be
463.2 ± 0.3 kDa (mean ± s.e.m.) (corresponding to a trimer) using
size-exclusion chromatography coupled in-line with multi-angle light
scattering and refractometry. The blue line represents the normalized
refractive index (right ordinate axis) and the red line shows the estimated
molecular mass (expressed in Da, left ordinate axis). b, MHV S binds with
high affinity to the soluble mouse CEACAM1a receptor. Thermophoresis
signal plotted against the MHV S concentration. The dissociation constant
(Kd) was determined to be 48.5 ± 3.8 nM. Values correspond to the average
of two independent experiments. The concentration of CEACAM1a used
was 500 nM.


© 2016 Macmillan Publishers Limited. All rights reserved

RESEARCH LETTER

Extended Data Figure 2 | CryoEM analysis of the MHV S trimer.
a, b, Representative electron micrograph (defocus: 4.6 µm) (a) and class
averages (b) of the MHV S trimer embedded in vitreous ice. Scale bars:
573 Å (micrograph) and 44 Å (class averages). c, Gold-standard (blue) and
model/map (red) Fourier shell correlation (FSC) curves. The resolution
was determined to be 4.0 Å. The 0.143 and 0.5 cut-off values are indicated by
horizontal grey bars.




[Extended Data Figure 3 image, panels a-k. Panel k x-axis: disulfide arrangement (sorted by energy), ticks 5-35; y-axis scale from −2,800 to −2,980. Colour key: Resolution (Å).]

Extended Data Figure 3 | CryoEM density for selected regions of the
MHV S reconstruction, local resolution analysis and density-guided
homology modelling of residues 453-535. The atomic model is shown
with the corresponding region of the map. a, b, Upstream helix.
c-e, Helix belonging to domain A (residues 284-296). f-h, Core β-sheet.
i, j, CryoEM density corresponding to the MHV S trimer (i) and a single
protomer (j), coloured according to local resolution determined with the
software Resmap. We interpret Resmap results as a qualitative (rather than
quantitative) estimate of map quality. k, Rebuilding of the MHV S domain
B using RosettaCM. Plot showing the energy mean and s.d. of the models
corresponding to the 30 lowest energy disulfide arrangements (out of 945)
for domain B.

Extended Data Figure 4 | Refinement and model statistics.

Extended Data Figure 5 | Structural organization of the S1 subunit.
a, Ribbon diagram showing a single S1 protomer. b, Close-up view of the
MHV S domain B. The structural motif used as a receptor-interacting
moiety by MERS-CoV and SARS-CoV is indicated. The density was too
weak to allow tracing of this segment (residues 453-535), which has been
traced by density-guided homology modelling using Rosetta. c, d, Ribbon
diagrams of the S1 trimer viewed from the side (c) and from the top
(looking towards the viral membrane) (d).

Extended Data Figure 6 | Mechanisms of membrane fusion promoted
by coronavirus S glycoproteins. a, Ribbon diagram of the MHV S2
pre-fusion structure. Disulfide bonds are shown as green sticks. b, Topology
diagram of the MHV S2 pre-fusion structure. PP, di-proline that will act as
a helix breaker. The presence of these di-proline motifs indicates that the
post-fusion HR1 coiled-coil could not extend up to the fusion peptide as
a single helix. This hypothesis is further supported by the observation of
a conserved disulfide bond formed between residues Cys894 and Cys905
(labelled 14 in a and b), which will prevent refolding of helices α22 and
α23 as a single extended helix. c, Ribbon diagram of the SARS-CoV
post-fusion HR1 helix obtained by X-ray crystallography (PDB 1WYY). The
residue numbers corresponding to the MHV A59 sequence are indicated.
d, Topology diagram showing the expected coronavirus S post-fusion
conformation derived from our MHV S structure and the SARS-CoV
post-fusion core crystal structure shown in c. e, Ribbon diagram of a
model of the MHV S2 post-fusion conformation. Residues belonging
to α21, α22, α23, fW, α24 and α25 are not represented owing to a lack of
structural information.

Extended Data Figure 7 | Structural organization of the S2 fusion
machinery. a, Ribbon diagram of the trimer of central helices. b, c, Ribbon
diagrams of the S2 trimer (starting at residue 755) viewed from the side (b)
and from the bottom (looking towards the host cell membrane) (c). Residues
Ala994 and Leu1062, which are discussed in the text, are shown in stick
format.

Extended Data Figure 8 | Class I viral fusion proteins with exposed fusion
peptide. a, MHV S (residues 870-887). b, Parainfluenza virus 5 F (PIV5 F,
residues 103-128, PDB 2B9B). c, HIV-1 gp41 (residues 518-528, PDB 4TVP).
The trimeric fusion proteins are shown as grey ribbon diagrams with the
fusion peptides rendered in magenta.

Extended Data Figure 9 | Sequence conservation among coronavirus S
glycoproteins. a, Sequence alignment of coronavirus S proteins.
Bovine-CoV, bovine respiratory coronavirus AH187 (gi 253756585); HKU1,
human coronavirus HKU1 (gi 545299280); HKU4, Tylonycteris bat
coronavirus HKU4 (gi 126030114); HKU5, Pipistrellus bat coronavirus
HKU5 (gi 126030124); MERS-CoV, Middle East respiratory syndrome
coronavirus (gi 836600681); MHV-A59, mouse hepatitis virus A59
(gi 1352862); MHV-JHM, mouse hepatitis virus JHM (gi 60115395);
MHV-2, mouse hepatitis virus 2 (gi 5565844); OC43, human coronavirus
OC43 (gi 744516696); SARS-CoV, severe acute respiratory syndrome
coronavirus ZJ01 (gi 39980889); Waterbuck-CoV, waterbuck coronavirus
US/OH-WD358-TC/1994 (gi 215478096). Asparagine residues featuring
N-linked glycan chains visible in the MHV S reconstruction are indicated
with a star. The S2 and S2′ cleavage sites are indicated with scissors at
positions corresponding to the MHV S sequence. Cysteine residues
involved in the formation of disulfide bonds are numbered according to
Supplementary Table 2. The secondary structure elements observed in our
MHV S reconstruction are indicated above the sequence. The black dotted
lines above the sequence indicate regions poorly defined in the density.
Although the viral membrane distal loops of the A domains are weakly
defined in the density, the availability of a crystal structure of this domain
from the same virus (PDB 3R4D) helped with the modelling.

Extended Data Figure 10 | Structural similarity of B domains among
coronavirus S glycoproteins. a, MHV (pink). b, MERS-CoV (orange, PDB
4KQZ). c, SARS-CoV (red, PDB 2AJF). d, HKU4 (blue, PDB 4QZV).